METHOD AND APPARATUS FOR
DISPLAYING INFORMATION 

RELATED APPLICATIONS 

The application is a continuation of application U.S. Ser.
No. 11/890,286, filed on Aug. 3, 2007, now abandoned,
which is a continuation of application U.S. Ser. No.
11/108,623, filed on Apr. 18, 2005, now abandoned, which is a
continuation of application U.S. Ser. No. 10/435,660, filed on
May 9, 2003, now U.S. Pat. No. 6,888,543, which claims
priority to and the benefit of provisional application U.S. Ser.
No. 60/452,857, filed on Mar. 7, 2003, and assigned to the
assignee of the instant application. The above-identified
applications are hereby incorporated by reference herein.
This application incorporates by reference U.S. utility
application Ser. No. 09/985,963, filed on Oct. 19, 2001, which
claims priority to provisional application U.S. Ser. No.
60/242,009, filed on Oct. 20, 2000, both of which are assigned
to the assignee of the instant invention. Both utility application
Ser. No. 09/985,963 and provisional application Ser. No.
60/242,009 are herein incorporated by reference.

GOVERNMENT RIGHTS 

This invention was made with U.S. Government support
under Grant F49620-01-1-0564 awarded by the U.S. Air
Force Office of Scientific Research; Cooperative Agreement
NAG2-1501 awarded by the Ames Research Center of the
National Aeronautics and Space Administration (NASA);
and Grant CA 55833-09 awarded by the National Institute of
Health (NIH). The government may have certain rights in the
invention.

FIELD OF THE INVENTION

The invention relates generally to the field of informatics 
and more specifically to the field of informatic displays. 

BACKGROUND OF THE INVENTION

The behavior of a large number of interacting elements in
a system is difficult to display, analyze and interpret. Many
techniques have been attempted to derive an ordered model
from the elements in such a system. An example of a system
which produces such a large amount of data is the genome.
Not only may the genome consist of a large number of genes
(numbering in the tens of thousands for the human) but many
genes of an organism interact. For example, many genes exert
control over other genes. That is, many genes either induce or
raise the expression level of other genes or inhibit or decrease
the expression level of other genes. The act of just displaying
the genes of a genome and their interactions, never mind
analyzing such a large amount of data, is daunting. Since the
genes exert control on each other, they do not change their
expression levels independently, but instead form a
genome-wide network of interactions. Similarly, proteins,
metabolites and other cell constituents are part of a network of
interactions. The consequence of this mutual control between
different genes or molecules is that the dynamics of the
molecular profiles are constrained to certain coherent,
recurring patterns.

Self Organizing Maps or SOMs have been used in an
attempt to group genes according to their expression activity
versus time profile. Under this technique, genes having
similar expression behavior are grouped together into clusters on
a matrix of behaviors. The output of this algorithm is
essentially just the information about individual genes with regard
to their assignment to one of these clusters. However,
displaying these genes as a matrix of expression behaviors provides
a complicated array of graphs that does not help significantly
with the analysis of the interactive gene behavior or with the
interpretation of coherent patterns that emerge in the
displayed expression profiles.

Similarly, other currently used analysis techniques for
gene profiling, such as hierarchical clustering, k-means
clustering or principal component analysis, group genes into
a small number of clusters relative to the total number of
genes, and also fail to visualize patterns within the overall
gene profile.

In order to study the recurring patterns within genome- 
wide expression or molecular profiles, it is necessary to moni- 
tor the change of entire profiles at different times during a 
sequential process, or in response to multiple variables, such 
as during the longitudinal monitoring of multiple patients or 
of the biological responses of cells or tissues following treat- 
ment with various drugs. Such comparative time course 
analysis will generate data volumes comprised of three 
dimensions: (i) the elements of the molecular profile (e.g., the 
genes in gene expression profiles); (ii) the time points at 
which the profile is measured and (iii) the time course for each 
of the various processes studied. Existing gene clustering 
techniques are generally not capable of simultaneously moni- 
toring multiple dimensions, and hence a new method is 
required to visualize these global time-dependent changes in 
gene expression profiles. 

The present invention provides a method and apparatus to 
not only group genes according to activity but also to display 
the activity in such a way that reveals characteristic patterns in 
the monitored profiles, hence visualizing the underlying rela- 
tionship between the genes that comprise the regulatory net- 
work. It also provides a method to display the information of 
all three dimensions — gene, time and process — simulta- 
neously. 

SUMMARY OF THE INVENTION 

The invention relates to a method for displaying large 
amounts of information. The method includes the steps of 
forming a spatial layout of tiles each corresponding to a 
representative element; matching observed elements onto the 
spatial layout of tiles of representative elements; assigning a 
respective value to each respective tile of said spatial layout of 
representative elements; and displaying an image of the spa- 
tial layout of tiles of representative elements. Each tile 
includes atomic attributes of the representative elements. 

In one embodiment the step of forming a spatial layout of
tiles includes the step of forming a self-organizing map. In
another embodiment the step of forming a self-organizing map
includes the steps of forming representative reference
elements, selecting a best matching unit centroid and adjusting
the best matching unit centroid to form an adjusted best
matching unit centroid. In still yet another embodiment the step of
matching an observed element to a representative reference
element includes the step of measuring the Euclidean
distance between the observed element and each representative
reference element, and selecting the representative reference
element having the least distance.

The invention also relates to a method of displaying
information. The method includes the steps of normalizing the
data of observed elements; generating reference centroids;
generating a map of the reference centroids; matching an
observed element to a respective reference centroid; and
adjusting the respective reference centroid and the
neighboring reference centroids. The method also includes the steps of
iterating the above steps, modifying the strength of
adjustment and the neighborhood radius to form a plurality of
adjusted reference centroids, and matching each observed
element to a respective adjusted reference centroid. The
method also includes the steps of assigning a respective value
for each sample, assigning a respective graphic
representation to each respective value and displaying a mosaic for each
sample. In another embodiment the step of displaying is
repeated to form a motion graphic. In yet another
embodiment the respective graphic is a 3-D representation.

The invention also relates to an apparatus for displaying 
large amounts of information. In one embodiment, the appa- 
ratus includes a tiler forming a spatial layout of tiles each 
corresponding to a representative element; a matcher match- 
ing observed elements onto said spatial layout of tiles of 
representative elements; an assigner assigning a respective 
value to each respective tile of said spatial layout of represen- 
tative elements; and a display displaying an image of the 
spatial layout of tiles of representative elements. Each tile 
includes atomic attributes of the representative elements. 

In one embodiment the tiler includes a self-organizing map 
processor. In another embodiment the assigner assigns a color 
to each respective tile. In yet another embodiment the display 
is a 3-D display. 

The invention also relates to an apparatus for displaying 
data. In one embodiment the apparatus includes a normalizer 
normalizing the data of observed elements, a reference cen- 
troid generator generating a plurality of reference centroids 
and a map of the reference centroids and a comparator match- 
ing an observed element to a respective reference centroid. In 
another embodiment a processor iteratively adjusts the 
respective reference centroid and neighboring reference cen- 
troids to form a plurality of adjusted reference centroids. The 
embodiment also includes a matcher matching an observed 
element to a respective adjusted reference centroid, a value 
assignor which assigns a respective value to each sample and 
a graphics engine assigning a graphic representation to each 
respective value and displaying a mosaic for each sample. 

BRIEF DESCRIPTION OF THE DRAWINGS 

These and other features of the invention will be better 
understood by reference to the attached specification and 
drawings of which: 

FIG. 1 is a graphic representation of an embodiment of the
method of the invention used to display gene activation;

FIG. 2 is a graphic representation of an embodiment of the
method used to generate the self organizing map of FIG. 1;

FIG. 3 is a graphic representation of the matching of gene
profiles to the self organizing map of FIG. 2;

FIG. 4 is a graphic representation of the results of an
embodiment of the invention used to observe gene expression
as a function of time and drug exposure;

FIG. 5 is a graphic representation of the results of an
embodiment of the invention used to observe gene expression
as a function of time and drug exposure that displays a change
in cell fate from a proliferative promyelocytic precursor cell
to a differentiated neutrophil;

FIG. 6a is a series of mosaic representations, as produced
by an embodiment of the invention, of gene expression in
various normal human tissue types from sixteen different
subjects;

FIG. 6b is a series of mosaic representations, as produced
by an embodiment of the invention, of gene expression in
various human tissue types obtained from sixteen subjects;
and

FIG. 6c is a series of mosaic representations, as produced
by an embodiment of the invention, of gene expression in
various prostate disease states, obtained from eleven human
subjects.

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

Referring to FIG. 1, for the purpose of simplifying the
explanation, the method of the invention is herein described
using the specific example of analyzing the expression of a
large number of genes, either from a number of tissue
specimens 10, in response to a variety of drugs 14 or a time course
of specific drugs 16. However, this is for the purpose of
example only and the method can be used to display a large
number of elements (in the example shown, genes) either
statically or over time. For example, the information
displayed could be a multitude of stocks as their prices vary with
time; telephone numbers used as a function of time; or any of
a number of examples of systems having a large number of
elements.

In FIG. 1, we assume that there are six samples, generating
six gene expression profiles to be analyzed 18. In this example
there are three possible scenarios as to the origin
of the samples and the associated study question.

In the first scenario each sample represents a tissue
specimen (from different patients) that will be compared (one
through six) 10. No time course is involved. In the second
scenario, three drugs (A, B, and C) are compared, each drug
being tested on two patients. Again, no temporal monitoring
is involved; only the static outcome for each drug/patient is
measured 14. In the third scenario, the six samples are derived
from monitoring two time courses (A and B) with three time
points (1, 2 and 3 hr) each. The two time courses represent the
response of a cell culture or patient to two different drugs,
Drug A and Drug B. The gene expression in those tissues is
examined through the use of a series of six microarrays.

The technique is the same regardless of the scenario that
generated the samples. In the example shown the expression
level of sixteen genes is examined over a period of three hours
after exposure to drug A or B 16. These gene expression data
are collected (step 10) within a single matrix of data 22
including all the data from all the microarrays. Importantly,
even if the samples represent time courses from independent
experiments or different patients, as is the case with the third
example, they are entered in the same data matrix. This
concatenation of data, although counterintuitive, is essential to
permit the invention to function when analyzing more than
one sample class. Specifically, it enables the comparison
between multiple time courses with respect to the individual
elements (genes) and the patterns in the profile, in response to
different perturbations.
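The concatenation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `concatenate_samples` and the expression values are hypothetical, and each time course is assumed to be a genes-by-time-points table listing the same genes in the same order.

```python
def concatenate_samples(*courses):
    """Join several genes-by-time-points tables column-wise into one matrix.

    Hypothetical helper: every table must list the same genes in the
    same order, one row per gene.
    """
    n_genes = len(courses[0])
    if any(len(course) != n_genes for course in courses):
        raise ValueError("every course must list the same genes in the same order")
    matrix = []
    for g in range(n_genes):
        row = []
        for course in courses:
            row.extend(course[g])  # append this course's time points for gene g
        matrix.append(row)
    return matrix


# Made-up values: 2 genes, 3 time points per drug.
drug_a = [[3, 4, 5], [1, 1, 2]]
drug_b = [[2, 2, 6], [0, 3, 3]]
matrix = concatenate_samples(drug_a, drug_b)  # 2 genes x 6 samples
```

Each row of `matrix` is then one gene's profile across all six samples, which is the form compared against the reference centroids.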

This concatenated data matrix is then used to generate (step
14) a plurality of reference centroids on a spatial layout
against which the observed gene expression values (element
values) are compared. In this example the reference centroids
(A through I) are generated using a Self Organizing Map 26
(SOM) but any number of optimization methods that can
generate centroids and place them on a layout that minimizes
differences between neighboring centroids may be used.
Alternative methods include, for example, energy
minimization algorithms and force displacement algorithms.

Referring to FIG. 2, to use a Self Organizing Map or SOM,
the SOM must be trained against the set of existing data in
order to generate the spatial layout of reference centroids.
This training is accomplished by initially developing a set of
reference centroids. Each reference centroid contains a list
of M expression values, one for each of the M samples. In FIG.
1, there are six samples, thus the reference centroid is a list
that contains six values. Every value of the reference centroid
represents the value of a single gene in one sample, and this
value is referred to as an atomic value, since it is associated
with one sample and one centroid. In this example, a three by
three grid 28 of nine reference centroids (A-I) is first created
using random numbers to generate the nine profiles (step 18).
The optimal total number of reference centroids (total
number of tiles per mosaic grid) can be determined empirically by
the user and depends on the particular structure of the data.
The grid size corresponds to the resolution of the analysis.
The larger the grid (more tiles), the lower the average number of
genes per tile and the finer grained the resolution. On smaller
grids (fewer tiles) more genes are assigned to a tile and a
'blurry', 'pixelated' resolution will result.

In the analysis of gene expression data obtained in
experiments with 10 to 100 sample microarrays, containing 1000 to
60,000 genes, the total number of reference centroids is
typically 100-3000. The grid is preferably not perfectly
square, in order to allow the mosaic image to orient itself
properly on the mosaic. For example, for a grid with 600
centroids, one may choose a grid of 20 by 30 tiles. One method
which can be used to determine the optimal grid size is to first
choose a desired average number of genes assigned to each
tile, J, and then to divide the total number of genes by J. J is
typically within the range of 1 to 15 genes; however, this
number may be varied as desired.
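The grid-size rule above (total genes divided by J, laid out on a non-square grid) can be illustrated as follows. This is a sketch, not the patented procedure: the function name `suggest_grid` and the `aspect` parameter (columns per row, defaulting to 1.5) are assumptions introduced only to obtain a rectangular rather than square layout.

```python
import math


def suggest_grid(n_genes, genes_per_tile, aspect=1.5):
    """Suggest (rows, cols) for a mosaic grid.

    n_genes / genes_per_tile gives the target number of tiles; `aspect`
    is an assumed column-to-row ratio that keeps the grid non-square.
    """
    n_tiles = max(1, round(n_genes / genes_per_tile))
    rows = max(1, round(math.sqrt(n_tiles / aspect)))
    cols = max(1, math.ceil(n_tiles / rows))
    return rows, cols


# 6000 genes at J = 10 genes per tile gives 600 tiles, laid out 20 by 30.
rows, cols = suggest_grid(6000, 10)
```

With these inputs the sketch reproduces the 20 by 30 layout mentioned above for 600 centroids; other values of `aspect` would give other rectangular shapes.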

Of the set of genes (one through sixteen) in the array 22, 
one gene (in this example gene nine) is randomly selected 
(step 22) and its time course or expression graph 32 is com- 
pared (step 26) to each of the centroids (A-I) of the set of 
reference centroids 28 until the best matching unit centroid 
(BMUC) is found (in this example reference centroid C). 
Which reference centroid is the best matching unit centroid or 
BMUC may be determined by any number of means includ- 
ing the minimum Euclidean distance between the gene 
expression graph and the reference centroid. Other measures 
of the best fit may include the Pearson Correlation between 
the gene expression graph 32 and the reference centroid graph 
(C); the use of common sets or the angular distance, to name 
but a few. 
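As an illustration of the matching step, the following sketch selects the best matching unit centroid by minimum Euclidean distance. The names `euclidean` and `best_matching_unit` and the numeric profiles are hypothetical; the `distance` argument is an assumption that leaves room for another measure, such as one based on Pearson correlation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(profile, centroids, distance=euclidean):
    """Return the label of the reference centroid closest to `profile`.

    `centroids` maps a label (e.g. "A" through "I") to a list of values.
    """
    return min(centroids, key=lambda label: distance(profile, centroids[label]))


# Made-up three-sample profiles for illustration only.
centroids = {"A": [5.0, 5.0, 5.0], "B": [0.0, 0.0, 0.0], "C": [1.0, 2.0, 4.0]}
gene_nine = [1.0, 2.0, 3.0]
bmuc = best_matching_unit(gene_nine, centroids)
```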

Once the BMUC is found, the BMUC is adjusted (step 30)
to bring it closer to the shape of the gene expression graph 32.
The BMUC 36 is not adjusted to exactly match the gene
expression graph but instead to be closer to it by a certain
amount (referred to as the strength of the adjustment). In the
example shown, the first adjustment 36 is made to bring the
BMUC to within eighty percent of the difference between the
BMUC and the gene expression graph 32. The resulting graph
is an adjusted BMUC C' 40 which looks more similar to the
gene expression graph 32. Next the neighboring reference
centroids (B, E and F) 44 are also adjusted (step 34) to match
the gene expression graph, albeit to a lesser strength of
adjustment (in this example sixty percent). The resulting adjusted
neighboring centroids are B', E' and F' 44'.
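The adjustment step can be sketched as a simple interpolation. One assumption is made loudly here: the strength of adjustment is read as the fraction of the centroid-to-profile difference that is closed in one step, so a strength of 0.8 moves each value eighty percent of the way toward the gene's value. The function name `adjust` and the numbers are hypothetical.

```python
def adjust(centroid, profile, strength):
    """Move `centroid` toward `profile` by the given strength (0..1).

    Assumed interpretation: strength is the fraction of the difference
    between centroid and profile that is closed in this step.
    """
    return [c + strength * (p - c) for c, p in zip(centroid, profile)]


gene_profile = [10.0, 0.0]
c_adjusted = adjust([0.0, 10.0], gene_profile, 0.8)  # BMUC, eighty percent
b_adjusted = adjust([0.0, 0.0], gene_profile, 0.6)   # a neighbor, sixty percent
```

Under this reading a strength of 1.0 would copy the gene profile exactly and a strength of 0.0 would leave the centroid unchanged.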

Although only nine centroids (A-I) are shown in this
example, in an actual application potentially thousands of
cells would appear in the array. The distance from the cell of
interest, in this case (C'), to the cells in the neighborhood
being adjusted, in this case B, E, F, is termed the radius and the
distance to the furthest cells being adjusted is the boundary
radius. In the example shown here the boundary radius is one,
but in an actual case it would extend out to many cells. The
radius will affect the patterns generated. A larger radius will
create a more rigid map, i.e. large variations in activation of
neighboring tiles will not be correctly displayed by the output
mosaic. A radius which is too small will fail to create visually
coherent mosaics and clearly defined neighborhoods of
similarly behaving genes. The user must find an appropriate value
in between these two extremes. Typically, the initial radius is
set to 30% of the map's smallest dimension. Therefore, a map
which is 45 by 50 will have a radius of roughly 14.
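A brief sketch of the neighborhood and initial-radius rules follows. Two assumptions are made: the nine centroids are laid out row by row (A B C / D E F / G H I), and grid distance is measured as the larger of the row and column offsets, so that diagonal tiles count as distance one. Under these assumptions the neighbors of C at radius one come out as B, E and F, as in the example. The function names are hypothetical.

```python
def initial_radius(rows, cols, fraction=0.3):
    """Initial neighborhood radius as a fraction of the smallest map dimension."""
    return fraction * min(rows, cols)


def neighbors(tile, rows, cols, radius):
    """Tiles within `radius` of `tile` (row, col), excluding the tile itself.

    Assumed metric: the larger of the row and column offsets, so that
    diagonal tiles count as distance one.
    """
    r0, c0 = tile
    found = []
    for r in range(rows):
        for c in range(cols):
            d = max(abs(r - r0), abs(c - c0))
            if 0 < d <= radius:
                found.append((r, c))
    return found


labels = "ABCDEFGHI"                       # assumed row-major 3 x 3 layout
c_tile = divmod(labels.index("C"), 3)      # (row 0, column 2)
near_c = "".join(labels[r * 3 + c] for r, c in neighbors(c_tile, 3, 3, 1))
radius_45_by_50 = initial_radius(45, 50)   # about 13.5, i.e. roughly 14
```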

The strength of adjustment for each cell up to the boundary
radius varies as a function of the radius value. The strength of
adjustment function may be linearly decreasing with
increasing radius or may take on some other function, such as
exponentially decreasing.

In this example, the only neighboring centroids adjusted
are immediately adjacent to the BMUC. So in the example
shown the radial distance is one and the strength of
adjustment is sixty percent. Again, in a real experimental situation,
as the radial distance is extended further from the BMUC, the
strength of adjustment would be decreased. So for example at
a radial distance of two the strength might be only forty
percent and at a radial distance of three the strength might be
only thirty percent and so on.
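The fall-off of the strength of adjustment with radial distance can be sketched with two candidate functions, one linear and one exponential. Both are illustrative assumptions: the names, the `peak` value of 0.8 at the BMUC, and the exact formulas are not taken from the specification, which gives only example percentages.

```python
import math


def linear_strength(distance, peak, boundary_radius):
    """Strength falling linearly from `peak` at the BMUC; zero past the boundary."""
    if distance > boundary_radius:
        return 0.0
    return peak * (1.0 - distance / (boundary_radius + 1.0))


def exponential_strength(distance, peak, rate):
    """Strength falling exponentially with radial distance (assumed `rate`)."""
    return peak * math.exp(-rate * distance)


# With peak 0.8 and boundary radius 3 the linear form gives
# approximately 0.8, 0.6, 0.4, 0.2 at distances 0, 1, 2, 3.
linear_values = [linear_strength(d, 0.8, 3) for d in range(5)]
exp_values = [exponential_strength(d, 0.8, 0.3) for d in range(4)]
```

The linear values match the eighty, sixty and forty percent figures of the example for distances zero to two; the thirty percent figure at distance three would need a different schedule.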

Next another gene is randomly chosen (step 38) and the
above described steps repeated. This iteration occurs many
times using all the genes in the set over and over again. At
each iteration the strength of adjustment for a given radius
from each BMUC is decreased. Thus, in the example shown,
at a radius of 1 the strength of adjustment was 80% for the
first iteration from the first BMUC and would be reduced for
a subsequent BMUC to, for example, 78% at a radius of one.
Thus the strength of adjustment is decreased over and over
again, allowing the profile in each of the cells to converge to a
final value. In this example the process is iterated 10,000
times. Once all the iterations have taken place, the SOM is
considered to be trained. Once the SOM has been trained the
result is a set of adjusted reference centroids (A'-I') (FIG. 3).
Because of the rapid convergence in the SOM algorithms, it is
usually sufficient to run the SOM for a number of iterations
equal to ten times the total number of genes in the input data
matrix. Convergence is tested by increasing the number of
iterations. Convergence is achieved when increasing the
number of iterations does not significantly change the output
mosaic patterns based on visual inspection.
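The whole training iteration can be put together as a minimal sketch. It is not the patented implementation: the function name `train_som`, the linear decay of strength and radius over the iterations, the in-neighborhood fall-off, the grid distance (larger of row and column offsets) and the random initialization range are all assumptions chosen for illustration.

```python
import random


def train_som(data, rows, cols, iterations, start_strength=0.8, seed=0):
    """Train a small self-organizing map; returns {(row, col): centroid}.

    Assumed schedule: strength and radius both shrink linearly with the
    iteration count; centroids start as random profiles within the data range.
    """
    rng = random.Random(seed)
    n_samples = len(data[0])
    lo = min(min(row) for row in data)
    hi = max(max(row) for row in data)
    grid = {(r, c): [rng.uniform(lo, hi) for _ in range(n_samples)]
            for r in range(rows) for c in range(cols)}
    start_radius = max(1.0, 0.3 * min(rows, cols))
    for t in range(iterations):
        decay = 1.0 - t / iterations
        strength = start_strength * decay
        radius = max(1.0, start_radius * decay)
        profile = rng.choice(data)  # randomly selected gene profile
        # Best matching unit centroid by squared Euclidean distance.
        bmu = min(grid, key=lambda k: sum((a - b) ** 2
                                          for a, b in zip(grid[k], profile)))
        for (r, c), centroid in list(grid.items()):
            d = max(abs(r - bmu[0]), abs(c - bmu[1]))
            if d <= radius:
                local = strength * (1.0 - d / (radius + 1.0))
                grid[(r, c)] = [x + local * (p - x)
                                for x, p in zip(centroid, profile)]
    return grid


# Made-up data: four genes, three samples; ten iterations per gene.
data = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [1.0, 1.0, 0.0], [0.9, 1.0, 0.1]]
trained = train_som(data, rows=2, cols=3, iterations=10 * len(data))
```

A convergence check in the spirit of the text would rerun `train_som` with more iterations and compare the resulting mosaics.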

Referring now to FIG. 3, at this point each of the genes (one
through sixteen) is compared against each of the adjusted
reference centroids (A'-I') and grouped according to its best
fit to one of the adjusted reference centroids (A'-I'), similar to
what was done during the generation of the adjusted reference
centroids (A'-I'). In the example shown, genes one, five and
sixteen cluster in adjusted reference centroid A'; genes two,
four and seven cluster in adjusted reference centroid F' and so
on. In some cases, the adjusted reference centroid, for
example adjusted reference centroid E', will have only one
gene in the cluster (in this example gene three). At this point
in the process all the genes are assigned to an adjusted
reference centroid and listed in a gene assignment list (Table 1). In
addition a correspondence table (Table 2) contains the
expression values corresponding to each of the adjusted reference
centroids at each point in time. The array of adjusted
reference centroids is called a mosaic and the location of each
centroid in the mosaic is a tile.
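Building the gene assignment list can be sketched as below. The function name `assign_genes`, the gene labels and the numeric profiles are hypothetical, and squared Euclidean distance is assumed as the measure of best fit.

```python
def assign_genes(profiles, centroids):
    """Group genes by their closest adjusted reference centroid.

    `profiles` maps gene name -> profile; `centroids` maps tile label ->
    centroid. Returns tile label -> list of gene names (possibly empty).
    """
    assignment = {label: [] for label in centroids}
    for gene, profile in profiles.items():
        best = min(centroids,
                   key=lambda label: sum((a - b) ** 2
                                         for a, b in zip(profile, centroids[label])))
        assignment[best].append(gene)
    return assignment


# Made-up two-sample data for illustration only.
centroids = {"A'": [1.0, 1.0], "B'": [9.0, 9.0], "C'": [5.0, 5.0]}
profiles = {"gene1": [1.0, 1.0], "gene2": [9.0, 9.0], "gene3": [2.0, 1.0]}
assignment = assign_genes(profiles, centroids)
```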

However, since each adjusted reference centroid is a vector
containing the centroid expression values of the assigned
genes in all the samples (in the example of FIG. 1 there are six
samples), it cannot be easily visualized. Thus, the invention
requires that each adjusted reference centroid is split back
into its atomic values as described above, reflecting the value
of the assigned genes in each of the samples. The atomic
values representing the same sample X in all the adjusted
reference centroids thus can be assigned to the tiles. The tiles
with associated atomic values then form the mosaic that
represents that sample X (FIG. 1, Table 2 and Table 3). By
repeating the procedure for each sample, one obtains one
mosaic for each sample 64.
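Splitting the centroids back into atomic values, one mosaic per sample, can be sketched as follows. The function name `mosaics_per_sample`, the sample labels and the values are hypothetical; the trained map is assumed to be stored as a dictionary keyed by (row, column).

```python
def mosaics_per_sample(grid, sample_names):
    """Split each centroid into its atomic values, one mosaic per sample.

    `grid` maps (row, col) -> centroid list; the i-th entry of every
    centroid is the atomic value for the i-th sample.
    """
    rows = 1 + max(r for r, _ in grid)
    cols = 1 + max(c for _, c in grid)
    return {name: [[grid[(r, c)][i] for c in range(cols)] for r in range(rows)]
            for i, name in enumerate(sample_names)}


# Made-up 2 x 2 map with two samples per centroid.
grid = {(0, 0): [3, 4], (0, 1): [4, 3], (1, 0): [5, 5], (1, 1): [3, 3]}
mosaics = mosaics_per_sample(grid, ["drug A 1 hr", "drug A 2 hr"])
```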

It is important to remember that genes having similar
expression profiles are grouped together in an adjusted
reference centroid tile and will have expression profiles similar to
but not necessarily identical to the profile of the adjusted
reference centroid at each point in time, as shown for example
by gene 8 being matched to tile H'.

It is possible to display the values in the adjusted reference
centroids or tiles at each point in time in several ways. In one
embodiment, each value or range of values may be assigned a
color, representing an atomic value of the tile. For example,
the value three in the example shown can be assigned the
color green; the value four can be assigned the color yellow and
so on. As a result, in this example, for values corresponding to
the first hour of drug A 60 (listed in Table 2), adjusted
reference centroids B', D', E' and F', which have a value of 3, will be
colored green, while adjusted reference centroids C' and H',
which have a value of 4, will be colored yellow.

Similarly, for values corresponding to the second hour of
drug A, adjusted reference centroids A', C' and G', which now
have a value of 3, will be colored green while adjusted
reference centroids B' and E', which now have a value of 4, will be
colored yellow. By displaying the values of the adjusted
reference centroids at each point in time (drug A one hour, drug
B two hours etc.) on the centroid location map (Table 3) the
result is a series of centroid maps 64, each corresponding to a
different time value. If these maps are displayed sequentially
a motion graphic is generated which shows the centroids
changing color with time as the centroid value changes.
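The color assignment and the sequence of frames for a motion graphic can be sketched as follows. The palette entries for 3 and 4 follow the example; the `default` color for unlisted values, the function name `colorize` and the mosaic values are assumptions for illustration.

```python
PALETTE = {3: "green", 4: "yellow"}  # value -> color, as in the example


def colorize(mosaic, palette, default="gray"):
    """Replace each atomic value in a mosaic with its assigned color.

    `default` is an assumed fallback for values not listed in the palette.
    """
    return [[palette.get(value, default) for value in row] for row in mosaic]


# Made-up mosaics for two consecutive time points.
hour_1 = [[3, 4], [5, 3]]
hour_2 = [[4, 3], [5, 3]]
frames = [colorize(m, PALETTE) for m in (hour_1, hour_2)]
```

Showing the entries of `frames` one after another, in time order, gives the kind of motion graphic described above.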

In this example, the tiles of the mosaic represent atomic
gene expression values, i.e. each mosaic is one time point
taken from a time course consisting of many time points. The
tiles however can represent anything which is, or can be
represented by, a single numeric value. For example, one may
wish to display information about the tiles and the number of
genes assigned to them, rather than just the expression level of
the tile.

Although this example has been described with various
colors corresponding to various values of the reference
centroids, other display methods are possible. For example,
various symbols can be used to denote various values instead of
color. Alternatively, if a 3-D graph is used, the height of the
block representing each tile can represent a given value. It is
also possible to have tones represent the various centroids,
such that by placing a cursor over the centroid of interest, the
tones generated over time will be indicative of the changes to
the values of the centroid over time. In general the means for
displaying the results are limited only by the imagination of
the designer.

Further, although each gene profile in this example is
matched to the adjusted reference centroid based upon the
corresponding profile of the centroid, what can be displayed
as a result of this match is not so limited. Thus, in this
example, although gene expression values were used to train
the reference centroids and match the genes to the reference
centroids, any property of the gene or of the centroid can
then be displayed. So in this example, once the genes have been
mapped to the tiles, other attributes of the genes can be
displayed that were not used in developing the spatial layout of
the matrix.

For example, one can display the number of genes assigned
to each centroid to produce a density map. Gene attributes that
can be mapped onto the tile, in addition to the expression
level, include but are not limited to: membership in certain
gene function classes (gene ontology), in promoter sequence
clusters or in phenoclusters; the expression level of the protein
encoded by the corresponding gene; or parameters from its
position in the interactome, such as connectance or centrality.
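A density map of the kind just described can be sketched by counting the genes assigned to each tile. The function name `density_map`, the gene labels and the row-major 3 x 3 layout of tiles A' through I' are assumptions; the tile memberships follow the example given for FIG. 3.

```python
def density_map(assignment, layout, rows, cols):
    """Count the genes assigned to each tile.

    `assignment` maps tile label -> list of genes; `layout` maps tile
    label -> (row, col). Tiles with no assigned genes get a count of 0.
    """
    counts = [[0] * cols for _ in range(rows)]
    for label, genes in assignment.items():
        r, c = layout[label]
        counts[r][c] = len(genes)
    return counts


# Assumed row-major layout of the nine tiles A' .. I'.
layout = {ch + "'": divmod(i, 3) for i, ch in enumerate("ABCDEFGHI")}
assignment = {"A'": ["g1", "g5", "g16"], "E'": ["g3"], "F'": ["g2", "g4", "g7"]}
density = density_map(assignment, layout, 3, 3)
```

Any other per-gene attribute could be aggregated per tile in the same way, for example by replacing the count with a mean or a fraction.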

FIG. 4 is an example of the result of the method used to
display the time course of gene activation in living cells
exposed to various drugs or molecules, including CD40L,
Anti-Ig, Terbutaline, Prostaglandin (PG) E, and Stromal
derived factor (SDF)-1. In this case, about 13,000 genes were
clustered into a rectangular mosaic grid with 2256 tiles
comprised of 47 columns by 48 rows. It is immediately visible,
even without quantitative analysis, that similar patterns of
gene expression profiles are activated after 4 hrs exposure to
CD40L and Anti-Ig on the one hand, and to Terbutaline,
SDF-1, and PGE on the other, indicating two distinct types of
molecular mechanism that are utilized by these chemical
agents. Note that the second group of ligands are known to
similarly activate G-protein coupled receptors and hence, in
fact, do share a common molecular mechanism of action.

By analyzing how gene expression profiles change over 
time, it is also possible to identify major shifts in the “state- 
space” (a high dimensional space in which each gene repre- 
sents a dimension) that may be indicative of attractor switches 
(a translocation of the gene expression in state space from one 
stable region to another mutually exclusive region) within the 
network. This would correspond to a significant all-or-none 
change in the functional activity of the entire network, as 
occurs during major changes in cell phenotype or fate, includ- 
ing switches between growth, apoptosis, and differentiation. 
For example, FIG. 5 shows two parallel time courses of HL60 
cells treated with two distinct drugs, DMSO and ATRA. Both
of these agents are known to cause terminal differentiation of 
HL60 cells into neutrophils. Major changes of both mosaics 
at 18-24 hr indicate a shift in state space, possibly associated 
with an attractor switch. This state space shift is quite distinct 
in the two processes, as evidenced by the disparate mosaic 
patterns at this time point. However, subsequent mosaics (48 
h-144 h) suggest that the responses converge in state space, 
since the overall mosaic patterns become more similar again. 
This indicates that the terminal differentiated state is an 
attractor state of the regulatory gene network. This signature 
of an attractor switch at 18-24 hr may be used to screen for
chemical agents that produce distinct phenotypic responses 
without having to wait until the end of the entire process. 

It is also possible to compare gene expression profiles from 
similar tissues taken from various individuals to visualize 
which genes are characteristically activated in the same tissue 
types and hence provide a diagnostic read-out of tissue type. 
In FIG. 6a, gene expression profiles of about 7000 genes were 
generated for samples of various normal human tissues from 
two donors per tissue type and the resulting gene expression 
data graphically displayed according to the teachings of the 
invention. As can be readily seen, the graphic representations 
of the tissue type from both individuals are similar for the 
same tissue type but are different between tissue types. This 
indicates that the graphic display as produced by the inven- 
tion can be used to automatically determine tissue type from 
gene expression information. 

The visualization of tissue data can also be used to diag- 
nose diseased tissues. For example gene expression profiles 
may be obtained from tumor of the same organ in different 
individuals. These profiles may be compared to identify a 
tumor-specific gene profile pattern, and to determine from 
which tissues the tumors arose by comparing their gene 
expression patterns with profiles from various normal tissue 
types. In FIG. 6b, gene expression levels of 1 2626 genes were 
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visualized using the present technology in a human tumor and 
3 different types of normal tissues (1,2, and 3) that were 
obtained from sixteen individuals. Note first that the gene 
expression patterns within each group of the three tissue types 
and the tumor are similar. As can be seen, tissue type 1 has a
triangular pattern of expression extending from the upper left
to the lower right and upper right corners of the grid. Tissue
type 2 has substantially a horizontal pattern of expression.
Tissue type 3 has an expression pattern which is substantially
a vertical pattern of expression. When the expression patterns
for the tumors are compared with these three tissue types, it is
seen that the tumor also has an expression pattern that extends
diagonally from the upper left to the lower right. Potentially
this may imply that the tumors have arisen from tissue type 1
and that the deviations in gene expression from the gene
expression in tissue type 1 may be indicative of the tumor 
state. 
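
The visual comparison described above could also be carried out numerically. The following is a minimal sketch in Python with NumPy, assuming each mosaic is available as a two-dimensional array of tile values on a common grid. The function name closest_tissue, the choice of Pearson correlation as the similarity measure, and the toy 3x3 mosaics are illustrative assumptions and are not taken from this specification.

```python
import numpy as np

def closest_tissue(tumor, references):
    """Return (name, correlation) of the reference mosaic most similar to `tumor`.

    Similarity is the Pearson correlation of the flattened tile values
    (an assumed measure, chosen only for illustration).
    """
    best_name, best_r = None, -np.inf
    for name, mosaic in references.items():
        r = np.corrcoef(tumor.ravel(), mosaic.ravel())[0, 1]
        if r > best_r:
            best_name, best_r = name, r
    return best_name, best_r

# Toy mosaics that mimic the patterns described for FIG. 6b (hypothetical values).
references = {
    "tissue type 1": np.eye(3),                                         # diagonal
    "tissue type 2": np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], float),  # horizontal
    "tissue type 3": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float),  # vertical
}

# A tumor mosaic that is mostly diagonal, with one deviating tile.
tumor = np.eye(3) * 0.9
tumor[0, 2] = 0.3

name, r = closest_tissue(tumor, references)
print(name, round(r, 3))
```

In this toy case the tumor mosaic correlates most strongly with the diagonal pattern, and the residual difference between the two arrays marks the tiles that deviate from the presumed tissue of origin.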

In addition, it is possible to classify or stage tumors according
to their profile. The tumors shown in FIG. 6b are the same
type of tumor, as indicated by their generally similar profiles.
Similarly, tissue types 1, 2 and 3 each display patterns that
are characteristic for their type. It is therefore possible to
distinguish various tissues and tumors visually using this
technique. In FIG. 6c, gene expression profiles of about 9000
genes were generated for samples of prostate tissue removed
from a normal region of a gland that also contains a tumor,
benign prostatic hyperplasia tissue, prostate cancer tissue
and metastatic prostate cancer tissue. Again, the resulting gene
expression data are graphically displayed according to the
teachings of the invention. As can be seen, the mosaic
representations of the four prostate tissue types not only vary
between the normal and the diseased tissues, but the patterns
in the diseased tissues are signatures of the type and stage of
disease. This means that the graphic representations generated
by the invention can be used to provide a means for the
automated diagnosis of tissue disease state. Such a diagnosis
based upon the graphic representation of the invention will
lead to low cost screenings of patients and rapid identification
based on gene profiles without a requirement for gene
identification or clustering.

The program used to perform the functions described is
also capable of providing additional information to the user.
For example, clicking on a tile in the mosaic lists all the genes
associated with that tile. By selecting one of those
genes, the user can have the program link to other databases to
provide information about the gene, gene products, and disease
states, to name but a few.
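
The tile look-up described above amounts to an index from each tile to its member genes. A minimal sketch follows, assuming a hypothetical list of (gene, tile) assignments produced by the mapping step. The gene names, the grid coordinates, and the function names are invented for illustration, and the link to external databases is not shown.

```python
from collections import defaultdict

def build_tile_index(assignments):
    """Group gene identifiers by the (row, col) tile they were mapped to."""
    index = defaultdict(list)
    for gene, tile in assignments:
        index[tile].append(gene)
    return index

def genes_on_tile(index, tile):
    """Return the genes listed when the user clicks `tile` (empty if none)."""
    return list(index.get(tile, []))

# Hypothetical assignments: (gene identifier, tile coordinate).
assignments = [
    ("GENE_A", (0, 0)),
    ("GENE_B", (0, 0)),
    ("GENE_C", (2, 1)),
]
tile_index = build_tile_index(assignments)

print(genes_on_tile(tile_index, (0, 0)))
```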

As discussed at the beginning of this specification, the
examples discussed herein are related to gene expression
levels. However, the technique may be applied to other large
amounts of data, for example, financial data such as in the
stock market. In this embodiment, individual stocks take the
place of genes and the price of each stock takes the place
of the gene expression level of the gene. The analyses would
proceed as before, watching groups of stocks rising and falling
in value just as we saw groups of genes being turned on or off.
Clusters of stocks in the same industry may be expected to
rise and fall together with news affecting the entire industry.
Conversely, a stock which is affected because of news specific
to its underlying company may be seen to deviate from the
industry cluster over time.
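
The stock embodiment can be sketched with a small self-organizing map in which each stock's normalized price series plays the role of a gene expression profile. The code below is a minimal illustration, assuming Euclidean matching, a Gaussian neighborhood, and linearly decaying adjustment strength and radius. The function names, parameter values, grid size, and synthetic price data are all assumptions made for the example and are not the implementation described in this specification.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=500, lr0=0.5, seed=0):
    """Train a small self-organizing map; returns (rows*cols, dim) centroids."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    radius0 = max(rows, cols) / 2.0
    # Reference centroids start as randomly chosen observations.
    centroids = data[rng.integers(0, n, size=rows * cols)].astype(float)
    # Grid coordinate of every tile, used for neighborhood distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # adjustment strength decays
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
        x = data[rng.integers(0, n)]
        # Best-matching centroid by Euclidean distance.
        bmu = np.argmin(np.linalg.norm(centroids - x, axis=1))
        grid_dist = np.linalg.norm(grid - grid[bmu], axis=1)
        influence = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
        # Move the winner and its neighbors toward the observation.
        centroids += lr * influence[:, None] * (x - centroids)
    return centroids

def assign_tiles(data, centroids):
    """Index of the closest centroid (tile) for every observation."""
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

# Synthetic prices: five stocks in a rising "industry", five in a falling one.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 20)
rising = 100 + 10 * t + rng.normal(0, 0.5, size=(5, 20))
falling = 100 - 10 * t + rng.normal(0, 0.5, size=(5, 20))
prices = np.vstack([rising, falling])

# Normalize each stock's series so that shape, not price level, drives the map.
z = (prices - prices.mean(axis=1, keepdims=True)) / prices.std(axis=1, keepdims=True)

centroids = train_som(z, rows=2, cols=2)
tiles = assign_tiles(z, centroids)
print(tiles)
```

Counting how many stocks land on each tile, or averaging their latest normalized price, would give the per-tile value to display. Repeating the assignment over successive time windows would show a stock drifting away from its industry cluster.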

Similarly, intelligence gathering may utilize the present 
invention in order to search for patterns of communication 
which might indicate some form of threat. For example, one 
could examine phone call patterns of designated individuals
using the number called, duration of call, or frequency of calls
as an attribute for each member of a phone network. Groups
of individuals who are interacting would show clustering and
the increase in call length or frequency could be related to 
some form of threat, just as the amount of communication or 
“chatter” is used by the intelligence agencies to determine 
potential threat levels at present. In addition, it may be pos- 
sible to identify critical “nodes” from which most informa- 
tion flows outwards (e.g., as would be expected from leaders 
of a terrorist cell) and to discriminate them from receivers or 
followers. Most importantly, this technology can identify 
major shifts in the “state-space” that are indicative of attractor 
switches within the network; this would correspond to a sig- 
nificant all-or-none change in the functional activity of the 
entire network, as one might expect before a major terrorist 
incident. Note that this is distinct from the level of activity 
within all of the individual connections, as is currently used in 
information gathering. 

The invention may be embodied in other specific forms 
without departing from the spirit or essential characteristics 
thereof. The foregoing embodiments are therefore to be con- 
sidered in all respects illustrative rather than limiting on the 
invention described herein. The scope of the invention is thus
indicated by the appended claims rather than by the foregoing 
description, and all changes which come within the meaning 
and range of equivalency of the claims are intended to be 
embraced therein. 

What is claimed is: 

1. A method of diagnosing prostate cancer in a prostate 
tissue sample comprising: 

forming a spatial layout of tiles, each tile corresponding to 
a representative reference element in a mosaic; 
mapping observed elements of a prostate tissue sample 
onto the spatial layout of tiles of representative reference 
elements, the observed elements comprising gene 
expression levels in a prostate tissue sample; 
assigning a respective value to each respective tile of the 
spatial layout of representative reference elements; and 
displaying, using a graphics engine, an image of the spatial 
layout of tiles of representative reference elements, 
wherein each tile comprises atomic attributes of representa- 
tive elements. 

2. The method of claim 1, wherein the prostate tissue 
sample comprises a prostate tumor. 

3. The method of claim 2, comprising classifying the stage 
of the prostate tumor based on the displayed image. 

4. A method of diagnosing prostate cancer comprising:

forming a spatial layout of tiles, each tile corresponding to
a representative reference element in a mosaic;
mapping observed elements onto the spatial layout of tiles 
of representative reference elements, the observed ele- 
ments comprising gene expression levels in a prostate 
tissue sample; 

assigning a respective value to each respective tile of the 
spatial layout of representative reference elements; and 
displaying, using a graphics engine, an image of the spatial 
layout of tiles of representative reference elements, 
wherein each tile comprises a visual representation of the 
respective value of the number of observed elements. 

5. The method of claim 4 wherein forming the spatial layout
of tiles, each tile corresponding to the representative refer- 
ence element in the mosaic, comprises forming a self-orga- 
nizing map. 

6. The method of claim 5 wherein forming a self-organiz- 
ing map comprises: 

a) forming representative reference elements; 

b) matching an observed element to a respective representative
reference element to select a best unit matching
centroid;

c) adjusting the best unit matching centroid to form an
adjusted best unit matching centroid; and

d) iterating steps b and c a plurality of times for each 
observed element. 

7. The method of claim 6 wherein matching the observed
element to the respective representative reference element
comprises measuring the Euclidean distance between the
observed element and each representative reference element;
and selecting the representative reference element having the
smallest Euclidean distance.

8. The method of claim 6 wherein matching the observed 
element to the respective representative reference element 
comprises measuring a correlation between the observed element
and each representative reference element; and selecting
the representative reference element having the highest
correlation. 

9. The method of claim 6 wherein adjusting the best unit
matching centroid comprises adjusting the value of the representative
reference element to form an adjusted best unit
matching centroid.

10. The method of claim 4 wherein assigning the respective
value to each respective tile of the spatial layout of represen- 
tative reference elements comprises assigning a color to each 
respective tile in response to the value of the tile. 

11. The method of claim 4 wherein assigning the respective
value to each respective tile of the spatial layout of represen- 
tative reference elements comprises assigning a height to 
each respective tile in response to the value of the tile in a 
three-dimensional graph. 

12. The method of claim 4 wherein the respective value of
the respective tile is a value associated with the respective 
reference element. 

13. The method of claim 4 wherein assigning the respective
value to each respective tile of the spatial layout of representative
reference elements comprises assigning, to each
respective tile in response to the value of the tile, a property
selected from the group consisting of symbol shape, size, hue
and pitch.

14. The method of claim 4 further comprising concatenating
data comprising observed elements.

15. The method of claim 4, wherein the observed elements 
comprise gene expression levels in at least one prostate tissue 
sample from a patient and gene expression levels in at least 
one normal prostate tissue sample. 

16. A method of diagnosing prostate cancer by displaying 
data obtained from observed elements in a prostate tissue 
sample, the method comprising: 

a) normalizing data of observed elements, the observed 
elements comprising gene expression levels in a prostate 
tissue sample; 

b) generating reference centroids; 

c) generating a map of the reference centroids; 

d) matching an observed element to a respective reference 
centroid; 

e) adjusting the respective reference centroid and neigh- 
boring reference centroids within a neighborhood 
radius; 

f) iterating steps d and e using a processor, modifying the
strength of adjustment and the neighborhood radius to 
form a plurality of adjusted reference centroids; 

g) matching each observed element to a respective adjusted 
reference centroid of the plurality of adjusted reference 
centroids; 

h) assigning a respective value for each adjusted reference 
centroid; 

i) assigning a respective graphic representation to each 
respective value; 

j) displaying a mosaic of each graphic representation; and 

k) determining whether the prostate tissue sample is nor- 
mal or diseased based on the displayed mosaic. 

17. The method of claim 16 further comprising iterating 
step j) to form a motion graphic. 



